Smart Paper Notes

Convergence Rate Analysis for Optimal Computing Budget Allocation Algorithms

Yanwen Li , Siyang Gao

Category: Machine Learning (Statistics) | Machine Learning

2022-11-27

Ordinal optimization (OO) is a widely studied technique for optimizing discrete-event dynamic systems (DEDS). It evaluates the performance of the system designs in a finite set by sampling and aims to make correct ordinal comparisons of the designs. A well-known method in OO is the optimal computing budget allocation (OCBA). It builds the optimality conditions for the number of samples allocated to each design, and the sample allocation that satisfies the optimality conditions is shown to asymptotically maximize the probability of correct selection for the best design. In this paper, we investigate two popular OCBA algorithms. With known variances for samples of each design, we characterize their convergence rates with respect to different performance measures. We first demonstrate that the two OCBA algorithms achieve the optimal convergence rate under the measures of probability of correct selection and expected opportunity cost. This fills a gap in the convergence analysis of OCBA algorithms. Next, we extend our analysis to the measure of cumulative regret, a main measure studied in the field of machine learning. We show that, with minor modification, the two OCBA algorithms can reach the optimal convergence rate under cumulative regret. This indicates the potential for broader use of algorithms designed based on the OCBA optimality conditions.

Translated by Google Translate
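
The OCBA optimality conditions mentioned in the abstract above can be illustrated with the classical OCBA allocation ratios from the simulation-optimization literature. The sketch below is an assumption-laden illustration, not the exact algorithms analyzed in the paper: it assumes maximization, a unique best design, known standard deviations, and it plugs in the given means as if they were the true means. The function name `ocba_fractions` is hypothetical.

```python
import math

def ocba_fractions(means, stds):
    """Classical OCBA budget fractions (sketch; assumes a unique best mean).

    For non-best designs i: N_i is proportional to (std_i / delta_i)^2,
    where delta_i is the gap between the best mean and mean i.
    For the best design b: N_b = std_b * sqrt(sum_i (N_i / std_i)^2).
    """
    k = len(means)
    best = max(range(k), key=lambda i: means[i])  # maximization is assumed
    weights = [0.0] * k
    for i in range(k):
        if i != best:
            delta = means[best] - means[i]  # must be > 0 (unique best)
            weights[i] = (stds[i] / delta) ** 2
    weights[best] = stds[best] * math.sqrt(
        sum((weights[i] / stds[i]) ** 2 for i in range(k) if i != best)
    )
    total = sum(weights)
    return [w / total for w in weights]

# Three designs with equal noise: the closer competitor gets more samples.
fractions = ocba_fractions([2.0, 1.0, 0.0], [1.0, 1.0, 1.0])
```

In this example the design with mean 1.0 receives four times the budget of the design with mean 0.0, because its gap to the best design is half as large.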

On the Finite-Time Performance of the Knowledge Gradient Algorithm

Yanwen Li , Siyang Gao

Category: Machine Learning (Statistics) | Machine Learning

2022-06-14

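The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Because the computation of KG is complex, theoretical analysis of the algorithm is difficult, and existing results mostly concern its asymptotic performance, e.g., consistency and asymptotic sample allocation. In this research, we present new theoretical results on the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive lower and upper bounds for the probability of error and the simple regret of the algorithm. With these bounds, existing asymptotic results become simple corollaries. We also show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Finally, we use numerical experiments to further demonstrate the finite-time behavior of the KG algorithm.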
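
For readers unfamiliar with the KG rule, the following is a minimal sketch of the standard knowledge-gradient factor for independent normal beliefs with known noise variances, as commonly stated in the ranking-and-selection literature. It is not taken from the paper itself; the function names and the choice of inputs (posterior means, posterior variances, noise variances) are assumptions made for illustration.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kg_factors(post_means, post_vars, noise_vars):
    """KG factor per arm (sketch; needs >= 2 arms and positive variances).

    sigma_tilde_i = var_i / sqrt(var_i + noise_i) is the standard deviation
    of the change in the posterior mean of arm i after one more sample.
    """
    factors = []
    for i, (mu, var, lam) in enumerate(zip(post_means, post_vars, noise_vars)):
        sigma_tilde = var / math.sqrt(var + lam)
        best_other = max(m for j, m in enumerate(post_means) if j != i)
        zeta = -abs(mu - best_other) / sigma_tilde
        factors.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return factors

def kg_choose(post_means, post_vars, noise_vars):
    """Index of the arm the KG rule would sample next."""
    f = kg_factors(post_means, post_vars, noise_vars)
    return max(range(len(f)), key=lambda i: f[i])
```

With equal posterior means, the rule prefers the arm whose belief is more uncertain, since one more sample there is expected to change the estimated best value the most.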


UTOPIC: Uncertainty-aware Overlap Prediction Network for Partial Point Cloud Registration

Zhilei Chen , Honghua Chen , Lina Gong , Xuefeng Yan , Jun Wang , Yanwen Guo , Jing Qin , Mingqiang Wei

Category: Computer Vision

2022-08-04

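High-confidence overlap prediction and accurate correspondences are critical for aligning paired point clouds in a partial-to-partial manner. However, there is inherent uncertainty between the overlapping and non-overlapping regions, which has long been neglected and significantly affects registration performance. Going beyond current wisdom, we propose a novel uncertainty-aware overlap prediction network, dubbed UTOPIC, to tackle the ambiguous overlap prediction problem. To our knowledge, this is the first work to explicitly introduce overlap uncertainty into point cloud registration. Moreover, we induce the feature extractor to implicitly perceive shape knowledge through a completion decoder, and present a geometric relation embedding for the Transformer to obtain transformation-invariant, geometry-aware feature representations. With more reliable overlap scores and more precise dense correspondences, UTOPIC achieves stable and accurate registration results, even for inputs with limited overlapping areas. Extensive quantitative and qualitative experiments on synthetic and real benchmarks demonstrate the superiority of our approach over state-of-the-art methods. Code is available at https://github.com/zhileichen99/utopic.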


GeoSegNet: Point Cloud Semantic Segmentation via Geometric Encoder-Decoder Modeling

Chen Chen , Yisen Wang , Honghua Chen , Xuefeng Yan , Dayong Ren , Yanwen Guo , Haoran Xie , Fu Lee Wang , Mingqiang Wei

Category: Computer Vision

2022-07-14

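Semantic segmentation of point clouds, which aims to assign a semantic category to each point, is critical to 3D scene understanding. Despite significant progress in recent years, most existing methods still suffer from either object-level misclassification or boundary-level ambiguity. In this paper, we present a robust semantic segmentation network, dubbed GeoSegNet, that deeply explores the geometry of point clouds. GeoSegNet consists of a multi-geometry-based encoder and a boundary-guided decoder. In the encoder, we develop a new residual geometry module from a multi-geometry perspective to extract object-level features. In the decoder, we introduce a contrastive boundary learning module to enhance the geometric representation of boundary points. Benefiting from this geometric encoder-decoder modeling, GeoSegNet can infer the segmentation of objects effectively while keeping the intersections (boundaries) of two or more objects clear. Experiments show clear improvements of our method over its competitors in terms of overall segmentation accuracy and object boundary clarity. Code is available at https://github.com/chen-yuiyui/geosegnet.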


AGConv: Adaptive Graph Convolution on 3D Point Clouds

Mingqiang Wei , Zeyong Wei , Haoran Zhou , Fei Hu , Huajian Si , Zhilei Chen , Zhe Zhu , Jingbo Qiu , Xuefeng Yan , Yanwen Guo

Category: Computer Vision

2022-06-09

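Convolution on 3D point clouds has been widely studied but is still far from perfect in geometric deep learning. The conventional approach to convolution characterizes feature correspondences among 3D points indistinguishably, an intrinsic limitation that leads to poor distinctive feature learning. In this paper, we propose Adaptive Graph Convolution (AGConv) for a wide range of point cloud analysis applications. AGConv generates adaptive kernels for points according to their dynamically learned features. Compared with solutions that use fixed/isotropic kernels, AGConv improves the flexibility of point cloud convolutions, effectively and precisely capturing the diverse relations between points from different semantic parts. Unlike popular attentional weight schemes, AGConv implements the adaptiveness inside the convolution operation instead of simply assigning different weights to the neighboring points. Extensive evaluations clearly show that our method outperforms state-of-the-art methods for point cloud classification and segmentation on various benchmark datasets. Meanwhile, AGConv can flexibly serve more point cloud analysis approaches to boost their performance. To validate its flexibility and effectiveness, we explore AGConv-based paradigms for completion, denoising, upsampling, registration, and circle extraction, which are comparable or even superior to their competitors. Our code is available at https://github.com/hrzhou2/adaptconv-master.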


Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation

Yang Zhang , Yang Yang , Chenyun Xiong , Guodong Sun , Yanwen Guo

Category: Computer Vision

2022-01-05

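Encoder-decoder models have been widely used in RGBD semantic segmentation, and most of them are designed as two-stream networks. In general, jointly reasoning about the color and geometric information in RGBD data is beneficial for semantic segmentation. However, most existing approaches fail to comprehensively exploit multimodal information in both the encoder and the decoder. In this paper, we propose a novel attention-based dual supervised decoder for RGBD semantic segmentation. In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and fuse deep, multi-level, paired complementary information. To learn more robust deep representations and rich multimodal information, we introduce a dual-branch decoder that effectively leverages the correlations and complementary cues of different tasks. Extensive experiments on the NYUDv2 and SUN-RGBD datasets demonstrate that our method achieves superior performance compared with state-of-the-art methods.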


Temporal Alignment Prediction for Few-Shot Video Classification

Fei Pan , Chunlei Xu , Jie Guo , Yanwen Guo

Category: Computer Vision

2021-07-26

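The goal of few-shot video classification is to learn a classification model with good generalization ability when trained with only a few labeled videos. However, it is difficult to learn discriminative feature representations for videos in such a setting. In this paper, we propose Temporal Alignment Prediction (TAP), based on sequence similarity learning, for few-shot video classification. To obtain the similarity of a pair of videos, we predict the alignment scores between all pairs of temporal positions in the two videos with the temporal alignment prediction function. In addition, the inputs to this function are equipped with context information in the temporal domain. We evaluate TAP on two video classification benchmarks, Kinetics and Something-Something V2. The experimental results verify the effectiveness of TAP and show its superior performance over state-of-the-art methods.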
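
The idea of scoring all pairs of temporal positions can be pictured with a small sketch. The abstract does not specify the prediction function, so the code below substitutes plain cosine similarity followed by a softmax over all position pairs; this stand-in, the function names, and the way scores are aggregated into one video-level similarity are all assumptions for illustration and not the TAP model itself.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def alignment_scores(video_a, video_b):
    """Softmax-normalized scores over all temporal position pairs.

    video_a, video_b: lists of per-frame feature vectors.
    Cosine similarity stands in for a learned prediction function.
    """
    raw = [[cosine(fa, fb) for fb in video_b] for fa in video_a]
    norm = sum(math.exp(s) for row in raw for s in row)
    scores = [[math.exp(s) / norm for s in row] for row in raw]
    return scores, raw

def video_similarity(video_a, video_b):
    """Alignment-weighted sum of frame-pair similarities (assumed aggregation)."""
    scores, raw = alignment_scores(video_a, video_b)
    return sum(w * s for ws, ss in zip(scores, raw) for w, s in zip(ws, ss))

# Two-frame toy videos with 2-D features.
clip = [[1.0, 0.0], [0.0, 1.0]]
opposite = [[-1.0, 0.0], [0.0, -1.0]]
same_sim = video_similarity(clip, clip)
diff_sim = video_similarity(clip, opposite)
```

In a few-shot setting, a query video would be assigned the class of the support video with the highest such similarity; a learned function would replace the cosine stand-in.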


Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.


Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.


Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Li Zhang , Chris Callison-Burch

Category: Natural Language Processing

2023-01-03

Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, which has little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
